Wie ist die Medienresonanz von Pressemittelungen politischer Parteien?
Assumptions:
Parteien wollen “ihre” Themen in den Medien platzieren, d.h. die Themen, die den - meist programmatisch bestimmten - Kern ihrer Wahlaussage bilden.
Parteien wollen Probleme in der Vordergrund rücken, für die sie nach Ansicht der Bevölkerung insgesamt oder nach Ansicht des eigenen Anhangs die Lösungskompentenz besitzen.
Parteien wollen Themen vermeiden, die aufgrund der aktuellen Sachlage gegen sie sprechen. Stattdessen wollen sie andere Themen (Sachthemen, Personal- und Stilfragen) in den Vordergrund rücken. Instrument hierfür sind Pressemitteilungen der Parteien und Fraktionen.
Parteien möchten, dass ihre Sichtweisen möglichst ungekürzt und unverfälscht publiziert werden.
Parties and candidates not only want to be present in the media (coverage bias), or evaluated in a positive way (tonality bias). They also want the media agenda to be congruent with their own agenda to define the issue-based criteria on which they will be evaluated by voters. Thus, parties choose their issue agenda carefully, highlighting issues that they are perceived to be competent on, that they “own” and that are important to their voters. In that sense agenda bias refers to the extent to which political actors appear in the public domain in conjunction with the topics they wish to emphasize.
To allow for an operationalization of agenda bias, I use parties’ campaign communication as an approximation of the potential universe of news stories (D’Alessio & Allen, 2000; Eberl, 2017). I compare the policy issues addressed in campaign communication (i.e., the party agenda) with the policy issues the parties address in media coverage (i.e., the mediated party agenda).
To discover the latent topics in the corpus of press releases (1.942) and news articles (11.880), a structural topic modeling (STM) developed by Roberts (2016) is applied. The STM is an unsupervised machine learning approach that models topics as multinomial distributions of words and documents as multinomial distributions of topics, allowing to incorporate external variables that effect both, topical content and topical prevalence.
STM assumes a fixed user-specified number of topics. There is not a “right” answer to the number of topics that are appropriate for a given corpus (Grimmer and Stewart 2013), but the function searchK uses a data-driven approach to selecting the number of topics. The function will perform several automated tests to help choose the number of topics including calculating the held out likelihood (Wallach et al. 2009) and performing a residual analysis (Taddy 2012).
I included the document source as a control for the topical topical prevalence, assuming that the distribution of topics depends on the sources. The number of topics is set to 80.
library(stm)
library(tidyverse)
library(ggthemes)
rm(list = ls())
load("../output/models/finalmodel_60.RDa")
model_df <- model_df %>%
dplyr::mutate(doc_index = as.numeric(rownames(.)),
source = ifelse(source == "welt.de", "DIE WELT", source),
source = ifelse(source == "zeit.de", "ZEIT ONLINE", source),
source = ifelse(source == "focus.de", "FOCUS Online", source),
source = ifelse(source == "bild.de", "Bild.de", source),
source = ifelse(source == "spiegel.de", "SPIEGEL ONLINE", source),
source = ifelse(source == "union", "Union", source),
source = ifelse(source == "spd", "SPD", source),
source = ifelse(source == "afd", "AfD", source),
source = ifelse(source == "gruene", "Grüne", source),
source = ifelse(source == "linke", "Linke", source),
source = ifelse(source == "fdp", "FDP", source)
)
model_df %>%
ggplot(aes(source, fill=type)) +
geom_bar(show.legend = F, alpha = 0.8) +
coord_flip() +
facet_wrap(~type, scales = "free") +
labs(title = "Document distribution", y=NULL, x = NULL)
To explore the words associated with each topic we use the words with the highest probability in each topic. As we included the source type (press release or news paper) as a control for the topical content (the word distribution of each topic), we have two different labels for each topic.
sagelabs <- sageLabels(stmOut)
topics.df <- as.data.frame(sagelabs$marginal$prob) %>%
transmute(topic = as.numeric(rownames(.)),
joint_label = paste( "Topic",topic, ":", V1,V2,V3,V4))
topics.df %>% select(joint_label) %>%
htmlTable::htmlTable(align="l", header = c("Topic Label"),
rnames = F)
| Topic Label |
|---|
| Topic 1 : koalition grünen spd schwarz |
| Topic 2 : spd schulz nahles martin |
| Topic 3 : eu europa staaten usa |
| Topic 4 : the is new of |
| Topic 5 : diesel hendricks autos fahrverbote |
| Topic 6 : trump gipfel us hamburg |
| Topic 7 : deutschland merkel regierung land |
| Topic 8 : grünen csu union cdu |
| Topic 9 : afd partei petry fraktion |
| Topic 10 : cdu niedersachsen spd althusmann |
| Topic 11 : prozent spd afd umfrage |
| Topic 12 : euro union milliarden cdu |
| Topic 13 : fragen daten beck bundesregierung |
| Topic 14 : deutschland menschen millionen europa |
| Topic 15 : merkel kanzlerin angela bundeskanzlerin |
| Topic 16 : afd prozent facebook zdf |
| Topic 17 : merkel wahl bundestagswahl afd |
| Topic 18 : csu cdu merkel union |
| Topic 19 : spahn cdu jens präsidiumsmitglied |
| Topic 20 : fdp jamaika grünen lindner |
| Topic 21 : spd nahles frauen andrea |
| Topic 22 : schäuble wolfgang finanzminister spd |
| Topic 23 : kinder eltern familie rente |
| Topic 24 : kohl helmut kanzler kohls |
| Topic 25 : eu europa deutschland europäischen |
| Topic 26 : menschen welt deutschland politik |
| Topic 27 : grünen özdemir partei habeck |
| Topic 28 : afd glaser stiftung partei |
| Topic 29 : spd mecklenburg schwesig vorpommern |
| Topic 30 : spd koalitionsverhandlungen groko union |
| Topic 31 : pflege beck spd prozent |
| Topic 32 : schulz spd martin merkel |
| Topic 33 : ge wulff be ten |
| Topic 34 : jahr zahl deutschland bundesregierung |
| Topic 35 : palmer bosbach jutta maischberger |
| Topic 36 : spd koalition union groko |
| Topic 37 : cdu vw altmaier spd |
| Topic 38 : polizei berliner berlin prozent |
| Topic 39 : afd aufhebung staatsanwaltschaft frauke |
| Topic 40 : bundeswehr soldaten leyen nato |
| Topic 41 : bundestag afd fraktion abgeordneten |
| Topic 42 : gabriel sigmar außenminister deutschland |
| Topic 43 : hamburg polizei gipfel hamburger |
| Topic 44 : baden württemberg fahrt deutschland |
| Topic 45 : cdu schleswig günther alter |
| Topic 46 : gauland afd alexander özoguz |
| Topic 47 : spd bundestag abstimmung union |
| Topic 48 : aachen guttenberg wahlkampf parteien |
| Topic 49 : afd höcke antisemitismus bewegung |
| Topic 50 : scholz hamburg polizei stadt |
| Topic 51 : türkei deutschland erdogan bundesregierung |
| Topic 52 : afd partei hampel pazderski |
| Topic 53 : deutschland flüchtlinge familiennachzug menschen |
| Topic 54 : deutschland bildung bund schulen |
| Topic 55 : weidel afd alice spitzenkandidatin |
| Topic 56 : amri berliner sicherheit anschlag |
| Topic 57 : linke linken spd partei |
| Topic 58 : maizière hamburg innenminister deutschland |
| Topic 59 : seehofer csu söder horst |
| Topic 60 : schmidt glyphosat hendricks spd |
theta <- as.data.frame(stmOut$theta) %>% # get all theta values for each document
mutate(doc_index = as.numeric(rownames(.))) %>%
# convert to long format
gather(topic, theta, -doc_index) %>%
mutate(topic = as.numeric(gsub("V","",topic))) %>%
# join with topic df
left_join(., topics.df, by="topic") %>%
# join with model_df
left_join(., model_df %>%
select(date,type,source,doc_index,title_text), by="doc_index")
For each document, we have a distribution over all topics, e.g.:
sample_doc <- sample(nrow(model_df),1)
# uncomment this to only select docs from press releases
#sample_doc <- theta %>% filter(type=="press") %>% sample_n(1) %>% select(doc_index)
#sample_doc <- sample_doc$doc_index
title <- model_df$title[which(model_df$doc_index == sample_doc)]
source <- model_df$source[which(model_df$doc_index == sample_doc)]
theta %>%
filter(doc_index == sample_doc) %>%
select(doc_index, joint_label, theta) %>%
ggplot(aes(joint_label, theta)) +
geom_col(fill="#0099c6", alpha = 0.8) +
ylim(c(0,1)) +
coord_flip() +
theme_hc() +
labs(title = paste("Topic distribution of document",sample_doc),
subtitle = paste0("Source: ",source,"\nTitle: ", title),
x = NULL, y = NULL
) +
theme(axis.text = element_text(size = 10))
What is the document acutally about?
model_df %>%
filter(doc_index == sample_doc) %>%
select(source, title_text) %>%
htmlTable::htmlTable(align="l", rnames=FALSE, header = c("Source", "Title + Body"))
| Source | Title + Body |
|---|---|
| stern.de | Kahrs: «Im Bund keine Option»: Rufe nach schärferer Absage an Rot-rot-grün in der SPD Kahrs: «Im Bund keine Option»: Rufe nach schärferer Absage an Rot-rot-grün in der SPD 22. Juli 2017 15:35 Uhr Kahrs: «Im Bund keine Option» Rufe nach schärferer Absage an Rot-rot-grün in der SPD Berlin - In der SPD werden zwei Monate vor der Bundestagswahl Rufe nach einer schärferen Absage an eine mögliche Koalition mit Beteiligung der Linkspartei laut. Fullscreen Haushaltsexperte Johannes Kahrs vertritt in der SPD den konservativen Seeheimer Flügel. Foto: Rainer Jensen © dpa-infocom GmbH In der SPD werden zwei Monate vor der Bundestagswahl Rufe nach einer schärferen Absage an eine mögliche Koalition mit Beteiligung der Linkspartei laut. «Rot-Rot-Grün ist in ostdeutschen Ländern möglich, für die Bundesebene ist es keine Option», sagte der Sprecher des konservativen Seeheimer Kreises in der SPD, , der «Welt am Sonntag». Die Linke habe die SPD als Hauptgegner und stehe für keine seriöse Außen- und Wirtschaftspolitik. Der SPD-Bundestagsabgeordnete Christian Flisek sagte der Zeitung: «Die Konflikte in einer Regierung aus SPD, Grünen und Linken wären derzeit immer noch zu groß, um Deutschland stabil zu regieren.» SPD-Bundesvize Ralf Stegner mahnte an, für eine möglichst starke SPD zu kämpfen, was Optionen eröffnen würde. «Koalitionsoptionen anzustreben oder auszuschließen, stärkt nur andere Parteien», schrieb der schleswig-holsteinische SPD-Landeschef am Samstag bei Twitter. Der Chef der SPD-Fraktion in Rheinland-Pfalz, Alexander Schweitzer, sagte der « Welt am Sonntag »: «Wenn eine Dreier-Konstellation notwendig wird, bin ich klar für Rot-Grün-Gelb.» In seinem Land funktioniere die Ampel wunderbar. dpa |
The expected proportion of the corpus that belongs to each topic is used to get an initial overview of the results. The figure below displays the topics ordered by their expected frequency across the corpus. The four most frequent words in each topic are used as a label for that topic.
overall_freq <- as.data.frame(colMeans(stmOut$theta)) %>%
transmute(
topic = as.numeric(rownames(.)),
frequency = colMeans(stmOut$theta)
) %>%
left_join(., topics.df, by = "topic") %>%
arrange(desc(frequency))%>%
mutate(order = row_number())
overall_freq %>%
ggplot(aes(reorder(joint_label, -order),
frequency, fill=frequency)) +
geom_col(show.legend = F) +
coord_flip() +
scale_fill_gradient(low = "#00AFBB", high = "#FC4E07") +
labs(x=NULL, y=NULL)
#ggsave("../figs/topic_proportion.png", height = 6, width = 4)
Agendas were measured in terms of percentage distributions across the 80 topics. For each source the average distribution of each topic is calculated for each month. The following pictures show the overall topic distribution.
# calculate topic mean by source and month
topicmean <- theta %>%
mutate(
year = lubridate::year(date),
month = lubridate::month(date)
) %>%
group_by(topic,source, month, year) %>%
dplyr::summarise(topicmean = mean(theta)) %>%
ungroup() %>%
spread(source, topicmean) %>%
filter(month != 3)
topicmean_news <- theta %>%
filter(type == "news") %>%
group_by(topic,joint_label, source) %>%
summarise(topicmean = mean(theta)) %>%
ungroup()
topicmean_press <- theta %>%
filter(type == "press") %>%
group_by(topic,joint_label, source) %>%
summarise(topicmean = mean(theta)) %>%
ungroup()
topicmean_news %>%
group_by(source) %>%
arrange(desc(topicmean), .by_group = TRUE) %>%
mutate(topic_order = row_number()) %>%
ungroup() %>%
group_by(joint_label) %>%
mutate(topicmean_mean = mean(topicmean)) %>%
ungroup() %>%
top_n(70, topicmean_mean) %>%
ggplot(aes(reorder(joint_label, topicmean_mean),
topicmean, label = topic_order,
fill = topic_order)) +
geom_col(show.legend = F) +
geom_text(hjust=-0.1, size=5) +
coord_flip() +
scale_fill_gradient(low = "#00AFBB", high = "#FC4E07") +
facet_wrap(~source, nrow = 1) +
labs(x=NULL, y=NULL) +
theme(axis.text.y = element_text(size=12))
topicmean_press %>%
group_by(source) %>%
arrange(desc(topicmean), .by_group = TRUE) %>%
mutate(topic_order = row_number()) %>%
ungroup() %>%
group_by(joint_label) %>%
mutate(topicmean_mean = mean(topicmean)) %>%
ungroup() %>%
top_n(50, topicmean_mean) %>%
ggplot(aes(reorder(joint_label, topicmean_mean),
topicmean, label = topic_order,
fill=topic_order)) +
geom_col(show.legend = F) +
geom_text(hjust=-0.1, size=5) +
coord_flip() +
scale_fill_gradient(low = "#00AFBB", high = "#FC4E07") +
facet_wrap(~source, nrow = 1) +
labs(x=NULL, y=NULL) +
theme(axis.text.y = element_text(size=12))
Then, we estimated bivariate correlations between party agendas and the mediated party agendas in the online news. These correlations represent the agenda selectivity each party experiences in each media outlet. The higher the correlation, the more congruent both agendas are.
media <- unique(model_df %>% filter(type == "news") %>% select(source))
parties <- unique(model_df %>% filter(type == "press") %>% select(source))
rm(corrDF)
for (i in parties$source) {
tempdf <- topicmean %>%
group_by(month, year) %>%
do(data.frame(Cor=t(cor(.[,media$source], .[,i])))) %>%
gather(medium, cor, 3:9) %>%
mutate(party = i,
medium = gsub("Cor.","",medium)) %>%
ungroup()
if (exists("corrDF")){
corrDF <- rbind(corrDF,tempdf)
} else {
corrDF <- tempdf
}
}
agenda <- corrDF %>%
mutate(date = as.Date(paste0(year,"/",month,"/1"))) %>%
dplyr::mutate(medium = ifelse(medium == "DIE.WELT", "DIE WELT", medium),
medium = ifelse(medium == "ZEIT.ONLINE", "ZEIT ONLINE", medium),
medium = ifelse(medium == "FOCUS.Online", "FOCUS Online", medium),
medium = ifelse(medium == "SPIEGEL.ONLINE", "SPIEGEL ONLINE", medium)
)
normalize_data <- function(x) {
# normalize data between -1,1
if (is.numeric(x)) {
y <- 2*((x - min(x, na.rm = T)) / (max(x, na.rm = T) - min(x, na.rm = T)))-1
return(y)
} else {
return(x)
}
}
p <- agenda %>%
mutate(
date =as.Date(paste("01",month,year, sep = "-"), format="%d-%m-%Y")
) %>%
ggplot(aes(date, cor, color = medium)) +
geom_line(show.legend = F) +
geom_hline(yintercept = 0, size = 0.3, color = "grey30", linetype = 2) +
facet_wrap(~party) +
labs(y=NULL, x =NULL)
# guides(colour = guide_legend(nrow = 1)) +
# theme(legend.position = "bottom",
# legend.title = element_blank())
plotly::ggplotly(p, tooltip=c("cor","medium"))
agenda %>%
group_by(party, medium) %>%
summarize(cor = mean(cor, na.rm = T)) %>%
spread(key = party, value = cor) %>%
ggiraphExtra::ggRadar(aes(color = medium),
interactive = T,
alpha = 0,
rescale = F,
legend.position = "bottom")